Abstract



Preliminary research of virtual reality suggests that this technology 
could be a powerful tool for education based on its immersive and dynamic 
attributes. The Virtual Reality Roving Vehicles (VRRV) Project at the 
University of Washington is exploring these possibilities by taking virtual
reality equipment into schools, where students and teachers experience VR
and build their own virtual worlds. Determining the educational efficacy of VR requires developing
appropriate and meaningful forms of assessing this new mode of learning. 
The question of how to assess VR is particularly significant because it 
exemplifies the broader, theoretical conflict between traditional and 
constructivist learning approaches. 

This report presents an example of how the VRRV Project is using VR 
in schools, and identifies significant factors for assessment. The issue of test
reliability versus validity is addressed both in terms of general education and
specifically in using VR. The underlying psychological theories of 
information processing and constructivism are discussed in terms of 
developing a comprehensive paradigm to guide the application and research 
of VR. This discussion is followed by an overview of specific approaches for 
measuring learning in VR, along with hints and cautions about conducting 
educational assessment. 
1. INTRODUCTION: Bringing VR into Schools

The Virtual Reality Roving Vehicles (VRRV) Project takes VR 
technology into public elementary, junior high and high schools and puts it 
in the hands of students and teachers. Our goal is to evaluate VR as a tool for 
students to develop broad-based abilities including, but not limited to,
problem solving, building mental models, developing effective metacognitive
strategies, and visualization. The VRRV is applying a
'constructivist' approach to instruction which puts each student in charge of 
their own process of learning. In the constructivist model, the teacher's role is 
to "support the constructive activities of the learning so that [students'] efforts
at constructing understanding - using our cognitive tools - become transparent
or ready-at-hand." (Winograd & Flores, 1986). Our research mission is to test
VR as a medium for making the teaching process "transparent", so students
can focus on content rather than falter with the mechanics of instruction.

It is important to ground the discussion of assessment in the VRRV's
process of introducing this technology into schools. Before moving ahead, let
us look at a sample scenario of how VR is being implemented for this 
research. In November 1994, the VRRV undertook a month-long world 
building project with 120 junior high school students at Kellogg Middle 
School in Shoreline, Washington. The Kellogg Project integrated the building 
of virtual worlds into a specially designed curriculum about wetlands ecology. 
Four classes of thirty students participated; each one was randomly assigned 
to focus on one of the wetlands life cycles: water, carbon, energy and nitrogen. 
Students learned the fundamentals of their respective cycle according to a 
constructivist curriculum designed by Kellogg teachers. Each class was then 
divided into three working groups, each of which planned and designed a virtual
world to express its members' understanding of the wetlands cycle they studied.
The contributions of the three working groups in each class were
brought together and a single virtual world was constructed for each of the 
four life cycles. The virtual wetlands worlds were populated with plants, 
animals, objects and landscapes which students created on desktop computers 
using 3D modeling software. As the final step of the learning process, 
students put on a VR head mounted display and experienced two of the 
wetlands worlds, their own plus one other. 

The Nitrogen-Cycle World was the most complicated of the four. In
this world, students physically manipulated objects in the virtual world and
acted out the cycle of nitrification and denitrification as it occurs in a
wetland. Students took free nitrogen, represented by a yellow ball, and placed
it in a lightning cloud to demonstrate one way nitrogen is fixed in the
atmosphere. The nitrogen then transformed into a fixed nitrogen molecule,
represented in the virtual world as a yellow ball orbited by four smaller balls.

Students then flew down to the surface of the wetlands and crossed
free nitrogen with a nitrifying bacterium to fix nitrogen into the soil. The fixed
nitrogen emerged within a patch of duckweed to signify the next step in the
cycle. The student then picked up a nearby duck and touched it to ("fed it")
the duckweed. Immediately, duck droppings and a dead duck appeared on the
wetlands shore to indicate the next step along the path for the nitrogen. A
denitrifying bacterium (blue ball) also appeared for the student to bring into
contact with the decaying matter, releasing free nitrogen back into the system
to start the process all over again.

As this scenario describes, the process of incorporating VR into the
school environment is highly complex and involves human, instructional,
and environmental factors. Unraveling these interwoven factors poses a
challenge for conducting assessment. A cohesive paradigm to guide
assessment does not yet exist; one must be created from existing
theories of educational assessment, human-computer interaction, and
psychology. Considering the substantial financial and human resource 
investment which may be required to implement VR in schools, 
comprehensive and accurate assessment of its virtues and weaknesses is 
crucial in defining the proper role for this technology. This report endeavors 
to define some parameters and methods for assessing learning with VR, 
towards the goal of creating a solid theoretical foundation to guide future 
research and implementation. 

2. THE VRRV APPROACH TO ASSESSING LEARNING 

The question of how to assess learning using VR is significant because
it establishes a scale of relative efficacy for the technology and also defines the
role VR will play in the overall context of education. Preliminary research at
the Human Interface Technology Laboratory at the University of Washington 
(Bricken and Byrne, 1993) and elsewhere (Loftin, Engelberg, & Benedetti, 1993; 
Regian & Shebilske, 1992; Moshell & Hughes, 1994) gives us an intuitive 
sense that VR could be highly useful to promote skills and knowledge which 
students can apply across many domains. The interactive and immersive 
qualities of VR suggest the potential for an entirely new form of experiential 
learning. 

The instructional model which designates students as passive 
recipients of declarative knowledge presented in tidy packets has been widely 
criticized for yielding fragmented and unintegrated learning. Instruction or 
assessment which is too narrowly focused cannot see the forest for the trees. 
Glaser (1990) describes how such fragmentation is especially pronounced in
higher cognitive areas such as problem solving. 



VRRV Assessment 






The danger of fragmentation is that an isolated focus on certain 
aspects of performance may underlie the frequent findings that 
students can solve problems but have little ability to explain the 
underlying principles and that those who can recite or even explain 
the principles are sometimes unable to recognize the conditions of 
applicability or to manage the requisite procedures efficiently. A 
major instructional research task is to design programs that test 
approaches to the integration of competent performance, and 
perhaps the most successful approach will be able to test a mix of 
instructional principles.... Attempts at integration promise to provide
new grounds for the development of a more encompassing theory of 
learning. 



(Glaser, 1990, p. 37)

VR may perhaps give us the opportunity for robust integration, but we must 
first address the difficult tasks of defining the range of competent 
performance, and developing assessment methods to adequately measure 
that performance. 

The newness and breadth of the topic of VR can present an obstacle to 
discussion. Ackerman (1994) describes five leverage points as a basis for 
discussion and research of VR in education. Her five points include: 
transformation as the world reacts to actions by the user, the qualities of 
immersion and point of view, issues of realism or verisimilitude, the sensual 
engagement of perceptual and symbolic modalities, and the factor of locus of 
control. While these points are all important, Ackerman's distinctions still
mix factors of instruction with factors of learning, which complicates
discussion of assessment.

For the purpose of the VRRV Project, we have broken our analysis into 
three categories for assessment: (I) instructional factors, (II) virtual
environment experience factors, and (III) external factors. Certain aspects of
each of these categories are bound to affect each other (Figure 1). This
interplay must be addressed in order to assess efficacy under real-world
conditions.
Figure 1: Assessment Factors




I. Instructional factors 

A major research objective is to determine how instruction leading up 
to and accompanying the students' VR experience influences learning 
outcomes. Assessment of instructional factors looks at how all aspects of the 
learning environment outside of the head mounted display affect the 
learning process. Instruction during the world building process, which takes 
place almost entirely outside of the virtual environment, is one major focus 
of assessment. 

The process of building virtual worlds exemplifies the constructivist 
paradigm of knowledge being formed within the individual through 
interaction with the world. Rather than passively receiving information, 
students can use VR to construct their understanding of the knowledge 
domain. When children build virtual worlds they are simultaneously 
structuring their own mental models. Therefore the objects and interactions 
contained within the world are a direct reflection of the learners' mental
models and symbolic representations. Assessment of the world-building
process should take account of how students develop their understanding of
the content, how that understanding is manifest in the world, and the quality
of the final product.

In the above example of the Nitrogen World, instructional variables 
include the approach to teaching the background knowledge on wetlands 
cycles which prepared students to build their worlds, the teaching during the 
world building process, and the level of guidance which students received as 
they acted out the nitrogen cycle. 

II. Virtual environment experience factors 

This category includes the students' experiences and activities while 
immersed in a virtual world. VRRV assessment focuses on the quality of
human-computer interaction, the educational efficacy of various hardware
and software interfaces, comparison of world designs, and the physical
sensation of presence. In the case of the Kellogg project, the worlds could 
have been created using different objects, types of interaction, or forms of 
instruction built into the world. How will such changes to the interface and 
experience of VR affect learning outcomes? 

"The experience in which an idea is embedded is critical to the 
individual's understanding of and ability to use that idea." (Duffy & Jonassen, 
1992, p. 4) In other words, experience is a vehicle for both knowledge creation
and recall. Students can experience VR to build their understanding from the
ground up. Winn (1993) suggests that VR can give students a physical and 
intuitive understanding of abstract concepts prior to tackling symbolic 
representations of the domain. The key to developing intuitive 
understanding lies in the interactive nature of VR, but care must be taken to 
avoid misconceptions based on incorrect intuition. 
Our research targets a number of important questions regarding how
different forms of interaction affect the quality of learning in VR. How do
children across a broad age range respond to virtual interfaces? How much learner
control of the virtual environment is optimal? If guidance is to be given to
the student, should it take place in the virtual environment using an avatar 
or animated guide, for example? Taking the example of the Nitrogen-Cycle 
World, was it the physical act of placing nitrogen in a cloud which helped 
students understand and remember the concept, or would a passive
experience of the interaction be equally effective?

Another assessment area examines the effect of various forms of 
feedback to support and guide the user. How should a virtual world react to 
student interactions? Winn (1987, 1992, 1993) and Winn and Bricken (1992) 
suggest the importance of dynamic feedback in virtual worlds to support 
learning. Winn (1992) suggests that virtual worlds can be imbued with the
ability to support students' construction of meaning. Thus it is important to
study the relative effectiveness of various modes of feedback. In addition, the 
level at which students rely on feedback can also be an assessment measure of 
performance. In other words, the more competence a student develops as she 
moves from novice to expert within a content domain, the less the student 
will rely on feedback for guidance. 

Winn (1993) suggests that the greatest educational benefit of VR lies in
the spatial quality of being immersed in another reality. This feeling has come
to be referred to as presence by VR researchers, even though a clear method
for measuring levels of presence has yet to be established (Hoffman, Hullfish,
& Houston, in press). Held and Durlach (1992) propose that synthetic,
computer-generated environments might enhance the performance of
humans operating remote robots. Sheridan (1992) speculates that presence
may improve sensori-motor or cognitive performance. While little is
currently known about the phenomenon of presence, VRRV research is
delving deeper into the potential benefits of immersion.

III. External factors 

Numerous factors unrelated to the VR technology itself will 
undoubtedly have a crucial impact on students' learning achievement. These 
factors include differences in individual classroom environments, student 
characteristics such as personal history or attitudes towards computers, 
teachers' attitudes and background in technology, and an assortment of social, 
economic and political variables related to schools, education and technology. 
A comprehensive assessment of VR technology must take account of how 
these external factors contribute to the overall context in which VR is applied. 

3. THE VALUE OF AUTHENTIC ASSESSMENT: VALIDITY vs. RELIABILITY 

The challenge of assessing learning goes beyond determining the 
efficacy of a single technology: assessment is inseparable from the broad goals
of education. Scholastic measures which do not match classroom teaching 
lock students in a no-win situation. Measures must be valid and meaningful 
reflections of skills and knowledge that students can transfer from classroom 
to the world outside school. Meaningful assessment reflects meaningful 
instruction. 

A major rethinking of educational assessment has begun across the 
United States. Forty states are in the process of enacting legislation or 
developing new assessment standards (Pipho, 1992, cited in Taylor, 1994, 
p. 234). We must consider the evaluation of VR in the broad context of this
educational reform. The new wave of standards includes performance
measures such as short-answer questions and student portfolios (Taylor,
1994). Thus we also must develop new rubrics of educational efficacy which
illuminate how VR can best fit into the new educational landscape.

Traditional assessment has overemphasized test reliability at the 
expense of validity (Taylor, 1994; Moss, 1992; Linn, Baker & Dunbar, 1991; 
Wiggins, 1989). Measures of learning, particularly achievement tests, have 
almost exclusively been multiple-choice tests of declarative knowledge. 
Priority in testing has been given to ease of administration and reliability for
reasons of convenience to the testers, but at a cost to students (Taylor, 1994;
Sternberg, in press). The result is that current testing procedures give us little
meaningful information about what children are learning and are capable of 
doing (Linn, Baker & Dunbar, 1991). This testing paradox is evident at every 
level of compulsory education, expressed in textbooks, curriculum and tests. 

Breaking free from this paradox will require changing both assessment 
practices and the content of curriculum. School experiences often fail to
match the expectations of the real world (Duffy & Jonassen, 1992). Numerous
researchers (Resnick, 1987; Brown, Collins, and Duguid, 1989; Sherwood, 
Kinzer, Hasselbring, and Bransford, 1987) have pointed to these disparities as 
a major underlying cause of failure to transfer school-based learning. 

Traditional testing imposes numerous inauthentic constraints as
indirect proxies for performance to preserve validity (Wiggins, 1992). Typical
artificial constraints include restricted access to reference materials, time
limits, and limited prior knowledge of the tasks and of how they will be assessed.
Constructivists, such as Jonassen and Duffy, attack such artificial testing 
constraints as ineffective techniques for measuring what is significant about 
student abilities. They believe "the critical aspect of performance is the ability 
to respond to the situation constraints - to be able to construct new plans
based on the changing demands and constraints of the situation." (Duffy &
Jonassen, 1992, p. 4) Thus testing in the constructivist paradigm is carried out 
in the closest approximation of the real-world performance environment as 
possible. Wiggins offers an interesting example of a more appropriate testing 
constraint: A physics teacher allows students to bring an index card to the 
exam with whatever notes they choose. The teacher collects the cards after the 
test, and notes that the content of the cards often reveals more about the 
students' knowledge than the exam answers (Wiggins, 1992, p. 31). 

The growing popularity of authentic assessment is pushing the 
development of measures which are valid reflections of students' ability and 
knowledge. However, authentic assessment does not merely mean using new 
methods to measure the same old learning. In their critique of science
assessment, Shavelson, Baxter, & Pine (1991, p. 355) note how performance
assessment approaches measure something significantly different about the 
scientific process than do traditional multiple choice tests. Instead of testing 
retention of verbal information, constructivist assessment tests the presence 
of more general indicators of learning such as mental models or the ability to 
construct plausible solutions to previously unencountered tasks. 

Cunningham (1992, p. 42) explains: "We check to see if the student is
developing self-awareness of the constructive process: the context-specific 
nature of interpretations, the value of multiple perspectives, the relativity of 
positions, etc." Constructivist assessment is often embedded in the learning 
process. 

Authentic assessment approaches have been criticized on the grounds 
that they are not reliable and are difficult to generalize across student 
populations. Some of these criticisms and possible solutions appear below. 
This discussion of general trends in educational assessment is
significant because it suggests a growing need to widely adopt performance
assessment. Thus the assessment standards and methods chosen for VR must
match broadly accepted practice in schools. Conversely, VR may offer
a highly controllable testbed to enhance the quality and reliability of 
performance assessment. The power of VR as a tool for both experiencing 
prebuilt worlds and, more importantly, world building by students, suggests 
the technology will be widely applicable for education. It is crucial to consider 
VR performance assessment within the general context of authentic 
assessment because VR developers need to anticipate the overall educational 
environment in which the technology is to play a role. 



4. DEVELOPING A THEORETICAL PARADIGM FOR VR IN EDUCATION 

Because the theory underlying the design of assessment tasks 
inevitably shapes the final form of assessment, it is essential to clarify the 
theoretical basis for assessment from the outset. Further research and 
application of VR will benefit from a well developed and appropriate 
working paradigm for applying the technology in education. 

The information processing model of human cognition has long been 
the predominant paradigm in psychology, human-computer research, 
educational research and the field of assessment. Information processing has 
been heavily influenced by the computational model of cognition (Newell & 
Simon, 1972), especially in the study of human-computer interaction. 
According to the information processing paradigm as stated by Lachman,
Lachman and Butterfield (1979, p. 99), cognitive psychology and computers
have much in common. "It [cognitive psychology] is about how people take in
information, how they recode and remember it, how they make decisions,
how they transform their internal knowledge states, and how they translate
these states into behavioral outputs." This paradigm stands firmly rooted in 
the objectivist tradition. 

Other information processing researchers such as Anderson (1983,
1990) have enhanced the computational model to make it more relevant to
education. Anderson's theory of Adaptive Control of Thought (ACT*)
moderates the information processing model to make it more applicable to
describing learning. ACT* has enjoyed rather wide acceptance, yet it does
not address some of the key elements of learning deemed important in the
constructivist paradigm, such as student motivation and attitude. Nor is
current information processing theory robust enough to describe highly
complex, integrated learning as it often happens in the real world.

Jonassen (1992, p. 138) charts the theoretical ideals of objectivism and
constructivism as polar opposites. He notes, however, that in reality 
instructional designers tend to fall somewhere in the middle of this 
continuum. 

objectivism <--- PI --- ID --- ITS --- Piagetian ---> constructivism
(externally mediated reality)          (internally mediated reality)

(PI: programmed instruction; ID: instructional design; ITS: intelligent tutoring systems)



The conflict over the validity of the objectivist approach to instruction 
and learning assessment is at the crux of what sets these two approaches apart. 
Is the act of learning merely the completion of a set of processes, as 
information processing suggests? Or is learning the act of constructing parts 
into a greater, more meaningful whole? A complete assessment of the 
educational efficacy of VR requires supplementing the useful aspects of both 
the information processing and constructivist approaches. Following are 
brief descriptions of the two paradigms. The purpose is to suggest what aspects
of information processing may be appropriate to our assessment, and to 
clarify the unique aspects of constructivist assessment. 

4.1 Information Processing 

A main feature of the information processing approach is the emphasis 
on a well defined understanding of expert behavior. The target knowledge 
domain is established from the outset and assessment is based on how closely 
a novice student is able to approximate the competence of an expert. 
Competence as described by Glaser (1990, p. 30) has three major aspects: "(a) 
the compiled, automated, functional and proceduralized knowledge 
characteristic of a well-developed cognitive skill; (b) the effective use of 
internalized self-regulation control strategies for fostering comprehension;
and (c) the structuring of knowledge for explanation and problem solving." 

Anderson's (1983) ACT* model has been widely applied to computer-based
training. The ACT* model is particularly relevant to learning
assessment in VR because of its focus on higher cognitive skills. Anderson 
(1983) names three stages to describe the transition from novice to expert. 

Declarative Stage: Knowledge is stored as bits of declarative
information.

Knowledge Compilation Stage: Transition of verbal information to
more complete mastery, or skill level. This stage features:

Composition: Combining sets of steps into single steps which
can be executed easily;

Proceduralization: Developing condition/action responses to
stimuli or situations.

Procedural Stage: Streamlining the set of procedures and strengthening
the processes.

The ACT* paradigm calls for a cognitive task analysis for each task 
before training and testing the skill. 

Royer, Cisero, & Carlo (1993) published a survey of techniques for 
assessing higher cognitive skills based on the paradigm of Anderson's ACT* 
model. Their approach breaks information processing into three distinct 
layers: 1) basic capacities; 2) cognitive skills capable of being transformed from 
controlled to automatic/encapsulated processes; and 3) higher cognitive skills 
for goal setting and planning cognitive activity. Assessment at any of these 
layers requires determining the current stage of skill development, not 
simply if a certain skill has or has not been acquired. Royer, Cisero, & Carlo 
(1993, p. 207) also suggest a helpful framework for categorizing cognitive skill 
assessment techniques: 

Knowledge organization and structure: Novices store knowledge as loosely
related facts. The degree to which knowledge becomes organized and structured
is an indicator of higher cognitive skill.

Depth of problem representation: Perception of the problem as abstract
principles. The novice perceives problems in terms of particular 
elements, not as a generalized set. The ability to perceive the principles 
underlying a problem is an index of skill development. 

Quality of mental models: The ability to imagine a system in operation. 
The model guides performance working within the domain. The 
presence and sophistication of mental models is a measure of skill 
development. 
Efficiency of procedures: Eliminating unnecessary steps in solving a
problem. The ability to efficiently use acquired skills is another index of
growing skill development.

Automaticity of performance: Efficient handling of cognitive load leaves
room for the extra processing needed to integrate information. Assessment tasks
should systematically represent the critical performing a completely 
unrelated task. Automatic and capacity-free performance is a measure 
of skill development. 

Metacognitive skills: Ability to reflect on and control performance 
efficiently. The ability to plan activity, monitor outcomes and alter 
behavior accordingly demonstrates skill development. 

Figure 2 (Royer, Cisero, & Carlo, 1993, pp. 209-10) is helpful for
matching specific task types to target cognitive dimensions. See Royer, Cisero
and Carlo's text for a detailed explanation of each task.

While the information processing paradigm offers a strong basis to 
analyze human-computer interaction, it is important to acknowledge that 
there are other paradigms through which to make assessment. In light of the 
weakness of current information processing theory to guide research in the 
creation of complex, integrated learning environments and to take factors 
such as attitude and motivation into account, assessment of educational VR 
would seemingly benefit from a broader and more robust paradigm of
learning. 




Figure 2: Cognitive Skill Assessment Techniques (from Royer, Cisero, & Carlo, 1993, pp. 209-10)

Cognitive Dimension Assessed
  Author                        Type of Task                                      Development Level of Cognitive Skill

Knowledge Acquisition
  -                             Traditional assessment                            -
  Ronan et al, 1976             Fireman tab test                                  Declarative
  Lesgold & Lajoie, 1991        Recall of electronic components                   Declarative

Knowledge Structure and Organization
  Shepard, 1962                 Multidimensional scaling                          All levels
  Geeslin & Shavelson, 1975     Associative recall of concepts                    All levels
  Chi et al, 1982               Conceptual recall of physics concepts             All levels
  Konold & Bates, 1982          Concept ratings                                   All levels
  Konold & Bates, 1982          Concept categorization                            All levels
  Reitman & Rueter, 1980        Concept free recall                               All levels
  Adelson, 1981                 Free recall of computer programs                  All levels
  Gutherie, 1988                Document search                                   All levels
  Card et al, 1980              Text editing                                      All levels
  Royer, 1990                   SVT assessment                                    All levels
  Carlo et al, 1992             Inferencing assessment                            All levels

Depth of Problem Representation
  Chase & Simon, 1973           Chess perceptual reproduction                     All levels
  Chase & Simon, 1973           Chess memory reproduction                         All levels
  Egan & Schwartz, 1979         Reproduction of electronic circuits               All levels
  Barfield, 1986                Program recall                                    All levels
  Chi et al, 1981               Physics problem sorting                           All levels
  Schoenfeld & Hermann, 1982    Math problem judgments                            All levels
  Carlo et al, 1992             Classification of scientific principles           All levels
  Adelson, 1984                 Flowchart comprehension                           All levels
  Adelson, 1984                 Insert missing line of program code               All levels
  Goulet et al, 1989            Identification of tennis serves                   All levels
  Allard et al, 1980            Recall of basketball positions                    All levels
  Purkitt & Dyson, 1988         Information usage in political decision making    All levels

Mental Models
  McCloskey et al, 1980         Prediction of flight path                         Declarative/Compilation
  Gentner & Gentner, 1983       Identifying underlying metaphors                  Declarative/Compilation
  Lopes, 1976                   Poker mental models                               All levels
  J.R. Anderson, 1990           Correct and buggy productions                     All levels
  Johnson, 1988                 Malfunctioning generator models                   All levels
  Lesgold et al, 1988           X-ray drawing                                     All levels

Metacognitive Skills
  Baker, 1989                   Text faulting                                     All levels
  Rosenbaum, 1986               Visit planning                                    All levels
  Gerace & Mestre, 1990         Planning in physics problem solving               All levels
  Lesgold et al, 1990           Problem space planning                            All levels
  Sweller et al, 1983           Changes in problem solving strategy               All levels

Automaticity/Encapsulation of Performance
  Lesgold & Lajoie, 1991        Speed of conceptual processing                    All levels
  Schneider, 1985               Dual task methodology                             All levels
  Britton & Tesser, 1982        Dual task methodology                             All levels

Efficiency of Procedures
  Glaser et al, 1985            Card sorting of assembly procedures               All levels
  Lesgold & Lajoie, 1991        Multimeter judgment                               All levels
  Lesgold & Lajoie, 1991        Multimeter placement                              All levels
  Lesgold & Lajoie, 1991        Logic gate efficiency                             All levels
  Green & Jackson, 1976         Hark-back technique                               All levels







4.2 Constructivism 

At this time, the question of how to assess learning in the
constructivist paradigm has gone largely unaddressed. Jonassen is one of the
few who have attempted to outline what constructivist assessment might look
like.

As evaluators we need to focus on learning outcomes that will 
reflect the intellectual processes of knowledge construction. Clearly, 
knowledge construction entails higher order thinking. So, outcomes 
of constructivistic environments should assess higher order 
thinking, such as that at the "find" level of Merrill's (1983) 
taxonomy, the "cognitive strategy" level of Gagne's (1987), and the 
"synthesis" level of Bloom's taxonomy. 

(Jonassen, 1992, pp. 140-1). 

Thus learning in the constructivist paradigm can perhaps be assessed
with modified versions of existing taxonomies and strategies. Whatever
methodology is chosen, it is clear that assessment must address both the
process of knowledge acquisition and the final product. Toward this end,
constructivists propose embedding assessment in the actual learning process.
This contrasts sharply with teaching and evaluation approaches that test
cumulative skills and knowledge only after the learning process has
theoretically been completed.

Based on the constructivist conception that learning is an
individualistic endeavor, Jonassen (1992) suggests that each individual
learner may be the only one capable of interpreting his or her own progress.
Therefore Jonassen believes that the evaluation of learning should be
goal-free relative to external criteria of success. But he also recognizes
that constructivism needs to develop valid methodologies for assessment in
order to gain wider acceptance. Jonassen cites Scriven (1973) for proposing
needs-based assessment methods as the most objective standards by which to
evaluate outcomes of any process. "Criterion-referenced instruction, where
the goals of learning drive the instruction, and evaluation are prototypic
objectivistic constructs and therefore not appropriate evaluation
methodologies for constructivistic environments." (Jonassen, 1992, p. 140)

Authentic tasks must reflect the real-world relevance and utility of
learning and should integrate knowledge across subject areas.
"Simplified, decontextualized problems are inappropriate outcomes for
constructivistic environments. So are they for evaluation, as well." (Jonassen,
1992, p. 141). Jonassen offers some specific suggestions, even if only in
sketchy, embryonic terms, describing the characteristics of desirable
assessment:

- "Rather than learning being referenced by a single behavior or 
set of behaviors, it should be referenced by a domain of possible 
outcomes, each of which would provide acceptable evidence of 
learning." 

- Should have a panel of reviewers, each with a meaningful 
perspective and reasonable credentials. 

- A novice might provide a better evaluation than an expert, who 
frequently focuses on inappropriate criteria of learning. 

- Evaluation of multiple products or outcomes is preferable to
assessing only a single one.

- "Evaluation from a constructivistic perspective should be less of 
a reinforcement and/or behavior control tool and more of a self- 
analysis and metacognitive tool." 

(excerpted from Jonassen, 1992, pp. 143-5) 

General agreement has yet to be reached on what types of knowledge
domains are appropriate for constructivist teaching. Jonassen (1992) suggests
that constructivistic learning environments are most appropriate for
advanced knowledge acquisition, while introductory knowledge acquisition is
likely better supported by more objectivistic approaches. Fosnot (1992,
p. 172) is critical of Jonassen's position: "In my mind, he [Jonassen] has
missed the main point of constructivism. Learners are always making meaning,
no matter what level of understanding they are on. Constructivism is not a
theory to explain only complex, ill-structured domains; it is a theory of how
learners make meaning, period! ... To assume the learner is a blank slate
until presented with information, and to characterize experiences or tasks
separate from the learner's meaning of them, is objectivistic, a perspective
which in the first chapter Jonassen (& Duffy) so radically opposed!" Winn
(1992, p. 179) writes: "I am not yet convinced that all knowledge can be
constructed by students. The student has to have some knowledge from which
to start construction. And that knowledge needs to be explicitly taught.
Constructivists may well disagree with this."

In summary, the constructivist paradigm differs from information
processing in a number of fundamental ways. Unlike information processing,
constructivism considers factors of motivation and interest to be crucial to
the learning process. Constructivism stresses the integration of diverse
knowledge rather than reducing the complex "behaviors" of experts to
subroutines. In terms of tasks for assessment, information processing tasks
are very often performance based, but they are defined for the student in
very specific ways. Constructivist tasks are student centered, often student
generated, and can result in a wide assortment of possible responses.

VR may prove to be an optimal medium for conducting constructivist
assessment as well as instruction. Because the computer system is dynamic, it
can record student interactions and gather data in the background as the
student moves through the virtual world. The student can then review this
record to reconstruct and evaluate the learning process. Thus the application
of VR as an assessment tool, in and of itself, is another promising area for
research.
5. CONDUCTING AUTHENTIC ASSESSMENT OF VR

It is a common practice of authentic assessment to embed the test
instrument in the learning process (Wiggins, 1989, 1992; Linn, Baker &
Dunbar, 1991). Wiggins (1992) states that good assessment is good instruction.
This point is crucial because it implies that the factors which contribute to
good instruction are themselves the measurement tool for assessment. One
example is the constructive feedback to the learner mentioned earlier. The
quality of feedback will influence learning. At the same time, the degree to
which a student relies on feedback can be interpreted as an indication of
competence. This interrelationship cannot be ignored when establishing
assessment criteria and measures.

When writing test questions, the questions themselves can serve as
exemplars of good teaching practices that are not likely to distort the
teaching and learning process. Linn, Baker & Dunbar (1991, p. 16) suggest that
questions should not be directly teachable, but that teaching toward them
should result in good instruction. Understanding the basis on which
performance will be judged also promotes improved performance.

The following sections cover a range of authentic assessment methods
and approaches. Since it is beyond the scope of this paper to discuss the
merits of each in depth, references have been included for each category to
direct the reader to relevant sources.

5.1 Problem solving 

Problem solving involves complex interactions among a multitude
of cognitive, metacognitive and knowledge-based processes. Szetela and Nicol
(1992, pp. 43-4) break the problem solving process down into three stages:
(a) understanding the problem, (b) solving the problem, and (c) answering
the question, and they score performance on each stage separately.
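The idea of scoring each stage separately, rather than marking only the final answer, can be sketched as follows. This is a minimal illustration and not Szetela and Nicol's published rubric: the three stage names come from the text, but the 0-2 point scale and the `StageScores` class are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical 0-2 point scale for each stage; the stage names follow the
# text, but the scale is an assumption for illustration only.
STAGES = ("understanding", "solving", "answering")
MAX_POINTS = 2


@dataclass
class StageScores:
    understanding: int
    solving: int
    answering: int

    def __post_init__(self) -> None:
        # Reject scores outside the assumed scale.
        for stage in STAGES:
            value = getattr(self, stage)
            if not 0 <= value <= MAX_POINTS:
                raise ValueError(f"{stage} score {value} is outside 0..{MAX_POINTS}")

    def profile(self) -> dict[str, int]:
        """Per-stage scores, kept separate instead of collapsed into one mark."""
        return {stage: getattr(self, stage) for stage in STAGES}

    def total(self) -> int:
        return sum(self.profile().values())

    def is_correct(self) -> bool:
        """The simplistic right/wrong view: full marks on the final answer."""
        return self.answering == MAX_POINTS


# Two students who are both simply "wrong" under right/wrong scoring.
student_a = StageScores(understanding=2, solving=1, answering=0)
student_b = StageScores(understanding=0, solving=0, answering=0)
print(student_a.is_correct(), student_b.is_correct())  # False False
print(student_a.profile())  # {'understanding': 2, 'solving': 1, 'answering': 0}
print(student_b.profile())  # {'understanding': 0, 'solving': 0, 'answering': 0}
```

Under right/wrong scoring the two students are indistinguishable; the stage profile shows that the first student understood the problem and partly solved it, while the second did neither.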

Scoring the stages separately gives a more detailed picture of students'
abilities than a simplistic approach such as recording only correct and
incorrect outcomes. Szetela and Nicol also identify the following typical
sequence of actions for successful problem solving:

1. Obtain appropriate representation of the problem situation 

2. Consider potentially appropriate strategies 

3. Select and implement a promising solution strategy. 

4. Monitor the implementation with respect to problem conditions and 
goals. 

5. Obtain and communicate the desired goals. 

6. Evaluate the adequacy and reasonableness of the solution. 

7. If the solution is judged faulty or inadequate, refine the problem 
representation and proceed with a new strategy or search for 
procedural or conceptual errors. 

When we consider these steps in terms of the characteristics of VR, a
clear picture begins to emerge of how VR could aid student problem solving.
Consider how VR maps onto each of the steps above. 1) VR may
prove to be a powerful visualization tool for representing abstract problem
situations. 2) Virtual worlds allow for a high degree of trial and error, which
may encourage students to explore a greater range of possible solutions.
3) The student is free to interact directly with virtual objects, which allows
for firsthand hypothesis testing. 4) The virtual world can be programmed to
offer feedback that focuses the student's attention on specific mistakes,
thereby enhancing students' ability to monitor their own progress. 5) The VR
system can collect and display complex data in real time, which may help
students obtain their desired goals. 6) The immersive nature of VR might
enhance students' capability to retain and recall information, which could
facilitate the evaluation of solutions. 7) The virtual world is a fluid
environment well suited to the iterative process of refinement.

The question remains of how to evaluate students' progress
along the steps presented above. Szetela and Nicol suggest six approaches for
generating questions to stimulate and assess problem solving that are
highly applicable to VR: (a) present a problem with all the facts and
conditions, but have the students write an appropriate question, solve the
completed problem and write their perceptions about the adequacy of the
solution; (b) present a problem with a partial solution; (c) present a problem
with unrelated facts and have students revise the problem; (d) have students
explain how they would solve a problem using only words, then do it; (e) after
students solve a problem, have them write a new one with a different context
but the same underlying structure; and (f) present a problem without
numerals, and have students supply numbers, estimate answers and solve the
problem themselves.

Another assessment approach might be to have the students create 
their own evaluation method for worlds they have built. In other words, 
have students define the learning task and the criteria they would use to 
evaluate an individual's performance in their world. This process would 
require students to analyze what information is crucial in their worlds, and to 
generate their own problems which users would have to solve. 

5.2 Concept mapping 

Concept mapping is a process where students organize a domain of
knowledge for themselves and express their understanding of the various
interrelationships in the form of a diagram (Novak & Gowin, 1984). Because
there are numerous ways to diagram any complex set of relationships, there is
no single "right" answer, making concept mapping an ideal instrument for
authentic assessment. The change in students' maps from pre-treatment to
post-treatment indicates both their learning and the sophistication of their
mental structures.
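One simple way to make the pre/post comparison concrete is to treat each map as a set of propositions (concept, linking phrase, concept) and report what was added or dropped. The sketch below assumes that representation; the nitrogen-cycle propositions are invented for illustration. Counts alone are not a quality score: a human judge still has to decide whether each new link is valid, and published scoring schemes weigh more than raw counts.

```python
# A concept map represented as a set of propositions:
# (concept, linking phrase, concept). This representation and the
# example content are illustrative assumptions.
Proposition = tuple[str, str, str]


def concepts(cmap: set[Proposition]) -> set[str]:
    """All distinct concepts that appear in the map."""
    return {c for (a, _, b) in cmap for c in (a, b)}


def compare_maps(pre: set[Proposition], post: set[Proposition]) -> dict:
    """Summarize how a student's map changed between two occasions."""
    return {
        "concepts_pre": len(concepts(pre)),
        "concepts_post": len(concepts(post)),
        "propositions_pre": len(pre),
        "propositions_post": len(post),
        "added": sorted(post - pre),
        "dropped": sorted(pre - post),
    }


pre = {
    ("bacteria", "live in", "soil"),
    ("plants", "need", "nitrogen"),
}
post = {
    ("bacteria", "live in", "soil"),
    ("plants", "need", "nitrogen"),
    ("bacteria", "fix", "nitrogen"),
    ("animals", "eat", "plants"),
}
change = compare_maps(pre, post)
print(change["concepts_pre"], change["concepts_post"])  # 4 5
print(change["added"])
# [('animals', 'eat', 'plants'), ('bacteria', 'fix', 'nitrogen')]
```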

Some educators view story maps as props which should be withdrawn 
as soon as possible; others see them as useful planning tools in preparation 
for synthesis activities (Quellmalz, 1991, p. 324). Typical criteria to assess the 
relative quality of concept maps include the appropriateness of the map to the 
content, content categories included in the map, the amount and quality of 
information portrayed, and the level of knowledge organization 
demonstrated. 

The Nitrogen-Cycle World example could be assessed as a concept map,
since it portrays the student's perception of relationships and processes in
the cycle. Students develop an internal concept map during the world building
process. Then they must figure out how to express their knowledge to others
through the medium of the virtual world. While the technological complexity
of VR may hamper students' ability to express themselves in the medium, there
is also a strong possibility that VR will open up a new avenue of innovation
and expression.

5.3 Metacognitive strategies

There is substantial evidence linking the quality of metacognitive
processing with the development of knowledge structures (Butterfield,
Albertson, & Johnston, 1993). Metacognitive components such as planning,
self-monitoring, evaluation and reflection are assumed to be indicators of how
closely students approximate the behavior of experts. Quellmalz (1991, p. 322)
uses a technique of having students give reflective accounts to explain what
they have learned. The sophistication of the explanation indicates how far
their knowledge structures have developed. Another externally visible
indicator of metacognition is the students' reliance on feedback and support
while using an instructional program, such as a virtual world. The term
'scaffolding' refers to the forms of assistance students require as they
progress through the learning process. Scoring rubrics focus on the amount and
nature of assistance required (Quellmalz, 1991, p. 324).
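A rubric that focuses on the amount and nature of assistance could be tallied from a simple log of interventions, whether kept by an observer or by the virtual world itself. The sketch below is hypothetical throughout: the four assistance categories and their weights are assumptions, not Quellmalz's rubric. A lower weighted score suggests more independent performance.

```python
from collections import Counter

# Hypothetical assistance categories, ordered from least to most directive,
# with illustrative weights. A real rubric would define its own categories.
ASSIST_WEIGHTS = {"prompt": 1, "hint": 2, "demonstration": 3, "direct_answer": 4}


def scaffolding_profile(assists: list[str]) -> dict:
    """Summarize the amount and nature of help a student needed on a task.

    `assists` is the sequence of assistance events recorded during the task,
    one entry per intervention.
    """
    counts = Counter(assists)
    unknown = set(counts) - set(ASSIST_WEIGHTS)
    if unknown:
        raise ValueError(f"unrecognized assistance types: {sorted(unknown)}")
    weighted = sum(ASSIST_WEIGHTS[kind] * n for kind, n in counts.items())
    most_directive = (
        max(counts, key=ASSIST_WEIGHTS.__getitem__) if counts else None
    )
    return {
        "amount": sum(counts.values()),   # how much help
        "by_type": dict(counts),          # what kind of help
        "weighted": weighted,             # more directive help counts more
        "most_directive": most_directive,
    }


log = ["hint", "prompt", "hint", "demonstration"]
print(scaffolding_profile(log))
```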

5.4 Cooperative learning 

There is general consensus that students working in small groups
achieve more than students working alone, especially in a
cooperative setting (Johnson, Johnson, & Stanne, 1985; Yager, Johnson, &
Johnson, 1985). The optimum size seems to be either two or three (Cox &
Berger, 1985; Webb, Ender, & Lewis, 1986). There is also general consensus
that paired students should be of the same gender and have similar abilities
(Dalton, 1990; Dalton, Hannafin, & Hooper, 1989; Johnson, Johnson, &
Stanne, 1985, 1986).

A common conception of VR, and of computer technology in general, is
that it isolates the user and reduces human interaction. One of the stated
missions of the VRRV Project is to explore how VR can be used to enhance
human interactions in a number of contexts. First, there are many
opportunities to encourage group collaboration within the design phase of
world-building. Second, the experience of a single student in VR does not
have to be conducted in isolation. Possibilities include interactions between
a student immersed in a virtual world and those outside it, or interactions
among students watching a classmate use VR. Finally, the VRRV Project has
the technological capability for two students to share the same virtual space
and collaborate on a single task. While a review of the literature on
collaborative learning effects is beyond the scope of this report, I would
like to mention two relevant studies of the educational effects of
collaboration in computer-based training.

Stephenson's (1991) study of computer-based training found that 
students benefited from teacher-student interaction of a social nature, and 
also through paired-learning arrangements. He also concluded that the 
relationship between students took the place of teacher-student interaction, 
since the most successful students were those who were in paired groups, 
followed by individuals who had high teacher-student interaction. 
Stephenson also found that weak students are more affected by a lack of
social interaction than are strong students. These findings indicate that the
one-student:one-computer model of computer-based training may be
fundamentally flawed because it neglects the social aspects of learning.

Dalton (1990) found that it is not merely the presence of collaboration 
which contributes to learning, but the quality of the interactions which is the 
determining factor. He found that structured learner interactions aid
encoding and cognitive processing, and that high-level elaboration (where
students explain the content out loud) is the critical, beneficial factor of
collaboration. Thus assessment of VR must measure more than the frequency of
interaction; it must measure the propensity of VR to stimulate meaningful
and productive collaboration.

These studies suggest that VR technology which fosters
collaboration will yield even greater educational benefits. The question for
research then becomes how to encourage meaningful collaboration both
inside and outside virtual space. Attention must also be given to how to
train instructors to promote desirable interactions when using VR.
Interestingly, if one establishes that the quality of student interactions is
correlated with learning and performance achievement, then a measure of
that quality becomes an indirect method of assessment.
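The difference between measuring the frequency of interaction and measuring its quality can be illustrated with coded session transcripts. The coding scheme below is hypothetical; only the "elaboration" code corresponds to the factor Dalton (1990) describes, students explaining the content out loud. The example shows how a pair that talks more can still elaborate less.

```python
from collections import Counter

# Hypothetical coding scheme for a pair's utterances during a VR session.
# "elaboration" marks a student explaining content out loud; the other
# codes are illustrative assumptions.
CODES = {"elaboration", "procedural", "social", "off_task"}
HIGH_LEVEL = {"elaboration"}


def interaction_summary(coded: list[str], minutes: float) -> dict:
    """Report both how often a pair talked and how much of it was elaboration."""
    counts = Counter(coded)
    unknown = set(counts) - CODES
    if unknown:
        raise ValueError(f"unrecognized codes: {sorted(unknown)}")
    total = sum(counts.values())
    high = sum(counts[c] for c in HIGH_LEVEL)
    return {
        "per_minute": total / minutes if minutes > 0 else 0.0,  # frequency
        "elaboration_share": high / total if total else 0.0,    # quality
    }


# Pair A talks more; pair B elaborates more.
pair_a = interaction_summary(
    ["procedural"] * 20 + ["elaboration"] * 2 + ["off_task"] * 8, minutes=10
)
pair_b = interaction_summary(["elaboration"] * 9 + ["procedural"] * 6, minutes=10)
print(pair_a["per_minute"], pair_b["per_minute"])  # 3.0 1.5
```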

5.5 Interview techniques

Interviewing is a central technique for authentic assessment because of
the value and emphasis placed on the experience of individual learners.
Interviews may be open ended or highly structured depending on the type of
assessment and the age of the subjects. In the process of explaining their
thinking or learning process, students reveal more than whether they can
correctly answer test questions. The language and manner in which students
explain themselves give insight into how developed their cognitive models of
the domain are. Specific interviewing techniques include using probing
questions, having the subject do free association, and videotaping student
performance, then replaying the video while the subject recounts the
experience (Suchman & Trigg, 1991).

Role playing exercises can be a revealing element of interview or 
debriefing sessions. Kourisky (1983) reports facilitating instructor-led, inquiry- 
oriented discussion and role playing sessions as a means to focus students' 
attention. 

It is important to keep in mind that students may not be able to express 
their own ability and knowledge accurately to the interviewer. Some students 
may be better at performing an investigation to solve a problem than they are 
at verbally explaining the operations involved in an investigation. 
5.6 Gathering data from performance tests in VR

Some possible data gathering techniques to assess performance in a
virtual environment include videotaping and analyzing the subject's body
movements in VR, observing the quality and level of student interaction with
the world, monitoring the interaction among students watching someone
experience VR, and monitoring the amount and types of assistance the student
requires to perform tasks.

5.7 Reciprocal teaching 

Brown and Palincsar (1984, 1989; Glaser 1990) describe reciprocal 
teaching as an instructional procedure where "students take turns in leading 
the class in the use of strategies for comprehending and remembering text 
content that the teacher models for the class. Its three major components are 
(a) instruction and practice with executive strategies—questioning, 
summarizing, clarifying and predicting in the course of reading text— which 
enable students to monitor their understanding; (b) provision, initially by a 
teacher, of an expert model of these metacognitive processes; and (c) a social 
setting that enables joint negotiation for understanding." In addition to being 
a successful instructional practice, reciprocal teaching is also an effective 
device for assessment. As a student organizes and verbalizes her knowledge
to teach another, the extent to which her understanding has developed
becomes visible. "The Reciprocal Teaching method creates a zone of proximal 
development where learners perform within their range of competence 
while being assisted in realizing their potential levels of higher performance 
(Vygotsky, 1978)." (cited in Glaser, 1990, p.33). 
Rosenshine and Meister (1994) have published a comprehensive review of
reciprocal teaching research which should prove a useful guide for designing
assessment.

5.8 Conducting computer-based assessment 

In the current context, computer-based assessment refers to conducting 
assessment using a conventional PC platform to test transference of learning 
out of the virtual environment. Using flat-screen computer simulations also
offers an alternative computer environment for comparison with VR.

Computer-based assessments have a well-established track record and
offer some attractive advantages over hands-on or paper-and-pencil testing
methods. Automating with computers makes assessment less costly and
time consuming to administer than hands-on or interview
assessments. The computer maintains a full record of performance for easy
review of the problem solving process. Embedding assessment in a computer
program can also offer advantages for the student and boost performance. For
example, students can experiment with the technology to discover solutions
to problems in ways that are unavailable in other types of assessments.

Nelson et al. (1993) describe methods for using data gathered by the 
computer as users move through a hypermedia system. Assessment can be 
based on time spent on particular screens, the paths taken as the user moves
from node to node within the system, or qualitative evaluation of social 
interactions matched with the record of human-computer interactions. These 
techniques apply to assessment of conventional multimedia, and could also 
be adapted for immersive VR. 
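The first two of these measures, time spent per screen and the path taken through the system, can be derived from a timestamped navigation log. The sketch below assumes a hypothetical log format of (seconds since session start, node entered); in a virtual world a "node" might instead be a region or object the student moves to. It is not the procedure described by Nelson et al. (1993), only an illustration of the kind of computation involved.

```python
from collections import Counter, defaultdict

# Hypothetical log format: (seconds since session start, node entered).
Event = tuple[float, str]


def dwell_times(log: list[Event], session_end: float) -> dict[str, float]:
    """Total time on each node, assuming the user stays on a node until the
    next logged event (and on the last node until session_end)."""
    ordered = sorted(log)
    totals: dict[str, float] = defaultdict(float)
    for (t, node), (t_next, _) in zip(ordered, ordered[1:] + [(session_end, "")]):
        totals[node] += t_next - t
    return dict(totals)


def path(log: list[Event]) -> list[str]:
    """Sequence of nodes visited, with consecutive duplicates collapsed."""
    nodes = [node for _, node in sorted(log)]
    return [n for i, n in enumerate(nodes) if i == 0 or n != nodes[i - 1]]


def revisits(log: list[Event]) -> dict[str, int]:
    """Nodes entered more than once; this could reflect deliberate review
    or disorientation, so it needs interpretation alongside other data."""
    return {n: k for n, k in Counter(path(log)).items() if k > 1}


log = [(0, "intro"), (30, "cycle_overview"), (95, "fixation"),
       (140, "cycle_overview"), (150, "quiz")]
print(dwell_times(log, session_end=200))
print(path(log))
print(revisits(log))  # {'cycle_overview': 2}
```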

A study conducted by Kumar (1994) used a HyperCard stack to assess
learning. He found that HyperCard and pen-and-paper assessment methods
influenced the performance of expert and novice students differently in tasks
to balance chemical equations. In a test of learning in high school chemistry,
Kumar found that students scored significantly higher using a computer than
with pen-and-paper. Novices using HyperCard actually did as well as experts
with pen-and-paper! Kumar credits the advantage to the computer's ability to
remember for the students, which reduces their overall cognitive load. The
computer also gives immediate feedback, which improves motivation and
attention to the assessment task. Hypermedia can provide a non-linear
environment for problem solving to allow the transfer of knowledge across 
domains (Kumar 1994, p. 64). Kumar's study is a good illustration of how a 
test can become a teaching tool. 

Some potential dangers in using hypermedia for assessment should be 
mentioned here. Researchers have found that it can be difficult to keep 
students on task in large hypermedia systems; students may become 
disoriented within the program (Kumar 1994); and there may be a gender bias 
favoring males (Clarke, 1990). For a detailed discussion of how and why to use
computer-based assessment approaches, see Shavelson, Baxter, & Pine (1991)
and Kumar (1994). 

5.9 The effect of VR on other behavior

Assessment should not overlook possible residual benefits and 
changes resulting from the introduction of VR into the classroom. Potential 
areas for study include: (a) increased use of computers, (b) changes in student
self-image and confidence, (c) implications of technology elsewhere in the 
classroom, and (d) carry over to other areas of student interest. 
6. ANALYZING PERFORMANCE

In addition to creating valid tasks, we must also conduct valid analysis 
of the data. Reeves (1986, 1992) is a sharp critic of the outcome of most 
experimental and quasi-experimental designs in education. His review of the 
literature found that few research and evaluation efforts have reported any 
statistically or educationally significant differences (Reeves, 1986, p. 102). 

Winn (1993) cautions that "...instructional designers are wrong to
assume that they can base instructional strategies on the analysis of an
objective, standard world... evaluation of learning can only tell us what
students appear, or pretend to know, not what they really know."

Reeves (1986, p. 103) suggests the need for a new paradigm of
assessment to draw more meaningful conclusions about educational media.
His two-step approach to the assessment process is as follows:

Step 1: measure differences in: 

a) initial characteristics of learners 

b) contextual variables 

c) dimensions of the instructional treatment 

d) criteria or outcomes. 

Step 2: Analyze measured differences in terms of: 

a) How much variance in outcomes can be uniquely attributed to
each of the predictor domains (student initial abilities, context
and treatment)?

b) How much variance can be attributed to interactions among 
the predictor domains? 
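Step 2 can be approximated with a standard regression-based decomposition, sketched below under stated assumptions: each predictor domain's unique variance is taken as the drop in R^2 when that domain is removed from an ordinary least-squares model, and the remainder of the explained variance is reported as shared among domains. This is one common reading of Step 2(b); Reeves does not specify this procedure, and statistical interaction terms (products of predictors) are another possible reading. The data are synthetic and for illustration only.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary least-squares fit of y on X, with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def partition_variance(domains: dict[str, np.ndarray], y: np.ndarray) -> dict:
    """Split explained variance into a unique part per predictor domain
    (R^2 lost when that domain is dropped) and a shared remainder."""
    names = list(domains)
    full = r_squared(np.column_stack([domains[n] for n in names]), y)
    unique = {}
    for n in names:
        others = [domains[m] for m in names if m != n]
        reduced = r_squared(np.column_stack(others), y) if others else 0.0
        unique[n] = full - reduced
    # Variance the domains explain jointly; can be negative under suppression.
    shared = full - sum(unique.values())
    return {"total_r2": full, "unique": unique, "shared": shared}


# Synthetic data: outcome driven by prior ability and treatment, not context.
rng = np.random.default_rng(0)
n = 200
ability = rng.normal(size=n)
context = rng.normal(size=n)
treatment = rng.integers(0, 2, size=n).astype(float)
outcome = 0.6 * ability + 0.3 * treatment + rng.normal(scale=0.5, size=n)

result = partition_variance(
    {"ability": ability, "context": context, "treatment": treatment}, outcome
)
print(round(result["total_r2"], 3))
print({k: round(v, 3) for k, v in result["unique"].items()})
```

In a real study each domain would usually contain several measured variables; passing a two-dimensional array (students by variables) for a domain works with the same functions.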
Measuring cognitive gains by constructing a causal model of the
critical dimensions of VR that influence learning outcomes is grounded in the
information processing paradigm, the antithesis of constructivism. Reeves
suggests basing such a causal analysis on Gagne's (1974) nine events of
instruction, which are heavily based on the assumptions of the computational
model. An attempt to construct such a model may indeed prove helpful in
understanding VR and in grounding the study of this new technology in the
proven and accepted legacy of the old. It is important to note, however, that
such an exercise would mean little when viewed from the constructivist
perspective.

7. THREATS TO VALIDITY AND RELIABILITY 

Shavelson, Baxter and Pine (1991) examine these criticisms and 
conclude that authentic assessment approaches can yield reliable results if 
each hands-on investigation is treated individually, with the obvious 
disadvantage that such procedures are far more time and labor intensive than 
traditional paper-and-pencil examinations. Authentic testing methods are 
also delicate instruments which require fine tuning and great care in 
administration. Inconsistency between observers is one of the major threats to
reliability for many strategies (Kazdin, 1982). Authentic tasks and tests are
often extremely heterogeneous: some are more difficult than others and they 
can vary widely in the specific knowledge-domain which they assess. Test 
results show that individual student performance can vary dramatically on 
similar test items and tasks. Many tests may also be biased toward students 
with previous experience in hands-on learning. Another criticism is that 
techniques such as self-reporting or interviews rely too heavily on an 
individual's verbal and communication abilities as an information source. 
Perhaps most importantly, Shavelson, Baxter and Pine (1991, p. 32) note that
"a substantial number of assessment tasks are needed to generalize, with any
degree of confidence, from students' observed performances to the science
domain of interest."

Educational assessment involves countless factors which could disrupt, 
alter or invalidate data collection that researchers in the physical sciences 
never need to address. Some of these problems can be attributed to the nature 
of working with human subjects, others to the environment of school 
administration and classrooms. The literature on assessment contains
substantial warnings of potential pitfalls which are worth noting.

One of the primary concerns in conducting complex assessment is to
ensure consistency across treatments and in the rating of student performance.

To guard against inter-observer error, conduct trial assessments using video
examples of sample subject performance to train assessment administrators
(Blumberg et al., 1986; Suchman & Trigg, 1991). Administrators should
practice with the tape and compare their results until agreement on scoring is
reached. Wiggins (1992) suggests developing a detailed protocol of how tasks
should be administered to ensure that judges will know the proper limits of
their interventions in response to student acts, comments or questions. He
notes how easy it is to completely invalidate a study's results with
inconsistencies.
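The sources cited above do not say how "agreement on scoring" should be judged during such training. One common choice, offered here as an assumption rather than as the cited authors' method, is Cohen's kappa, which corrects the raw percentage of matching scores for the agreement two raters would reach by chance. The sketch below computes both for two administrators scoring the same videotaped performances; the scores and the 0-2 scale are invented for illustration.

```python
from collections import Counter


def percent_agreement(r1: list, r2: list) -> float:
    """Share of performances given the same score by both raters."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1: list, r2: list) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = percent_agreement(r1, r2)
    n = len(r1)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement, from each rater's own distribution of scores.
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_o - p_e) / (1.0 - p_e)


# Two administrators scoring the same eight videotaped performances.
rater_a = [2, 1, 0, 2, 1, 2, 0, 1]
rater_b = [2, 1, 1, 2, 1, 2, 0, 0]
print(percent_agreement(rater_a, rater_b))       # 0.75
print(round(cohens_kappa(rater_a, rater_b), 3))  # 0.619
```

Here the raters match on 75% of performances, but once chance agreement is removed the kappa is about 0.62, which a research team might judge insufficient before live scoring begins; where to set that threshold remains a decision for the team.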

If assessment relies on classroom teachers making and recording
observations, it is helpful to make tasks maximally self-sustaining and the 
record-keeping obligation mostly the students'. Systematization and 
automation of the assessment process will free the teacher to focus on more 
valuable judgments (Wiggins, 1992). 

Ogborn (1994) raises a number of cogent cautions regarding the design
and exploration of learning environments. He points out some difficulties in
designing tasks for testing expressive, as opposed to exploratory, use of
software. Task goals must be concise and clearly explained to the user. Also,
ample time must be allowed so the user progresses beyond mastering the
interface to focusing on the content of the task. Ogborn criticizes much
research for expecting to achieve learning gains with unrealistically short
treatment times: "Most worthwhile learning takes a good long time to
achieve, best measured in weeks or months than in days or hours." (Ogborn,
1994, p. 35).

Gender bias is one potential confounding factor in educational 
assessment, particularly in research related to technology. Clarke (1990) 
advises researchers to take account of external influences which may create 
gender effects when developing test questions. For example, he found that 
test questions which involved female-stereotyped activities such as 
determining the most effective flooring for a kitchen did not engage some 
boys. 

Specific problems may arise in certain domains of knowledge due to
students' preconceived notions and attitudes. Clarke (1990) found students'
views of what is or is not "science" are shaped by personal experience. 
Consequently, students may reformulate an assessment task to fit their 
perception of science and proceed to solve the problem in ways incompatible 
with those intended. 

Researchers must also be cautious of the influence of developmental
changes and age-specific phenomena on research results. The way in
which assessment activities are administered must be consistent across all
age groups to take account of developmental changes in problem solving.
This will also help determine which activities are inappropriate for a given
age group.
Another potential source of confounding variables can generally be
characterized under the heading of learner types. That is, specific learner 
characteristics such as prior knowledge, general aptitude, gender, learning 
style, socio-economic background or previous experience with technology 
might significantly influence learning with VR for specific students. 

While it is beyond the scope of this report to even begin to address the
numerous individual differences worthy of study, let us look at the single
characteristic of field-dependent versus field-independent learners as a case
in point. A number of studies (Frank and Keene, 1993; Davis & Cochran, 1989;
Frank, 1983) suggest a significant distinction between field-dependent and
field-independent learning styles. The construct of field
independence-dependence refers to the stable and pervasive preference of
individuals for either analytical or global information processing.
Field-independent individuals are strong in perceptual and conceptual tasks,
actively segmenting information into relevant parts and analyzing the 
interrelationships among those parts. Field-dependent individuals process 
information in a global, holistic, and passive fashion; their processing tends 
to be dominated by the existing organization of the perceptual and cognitive 
field (Goodenough, 1976). 

Future research in VR might examine ways to encourage field-dependent
students to use a more active and flexible style of information
processing. This training could focus on developing a range of skills,
including metacognitive awareness and mathemagenic memory strategies (e.g.,
elaboration, categorization, thematic organization), or it could incorporate
Vygotsky's (1978) concept of the zone of proximal development within cooperative
group training activities (Johnson & Johnson, 1987; Slavin, 1986). VR could be
a vehicle to encourage active processing strategies for field-dependent
students by offering direct, physical interaction with, and manipulation of,
abstract content.

8. CONCLUSION 

A comprehensive evaluation of the educational efficacy of VR must 
take account of all three factor areas for assessment: instructional, experiential 
and external. Meaningful assessment requires robust rubrics and standards in 
order to illuminate the unique aspects of VR. Student performance with the
technology should be observed and rated over an extended period of time and
should include the learning process, not merely a single test of outcome.
Assessment procedures must be relevant to the content area. When assessment is
embedded in the learning process, it is important to clarify the distinction
between individual factors, such as feedback or cooperative learning, which can
serve either as an independent variable of instruction or as an assessment measure.

Considering the incomplete nature of the field at this time, the key to
conducting meaningful assessment will be to apply multiple measures of
learning and performance. Reciprocal teaching and open-ended interview
techniques will yield the greatest bounty of data, but these methods suffer
from being labor-intensive and weak at yielding quantifiable comparisons.
Perhaps the most promising form of assessment will be to use the computer
to capture students' motions and interactions, which significantly speeds data
collection and can also become a basis for students to recount their experiences.
A variety of interview techniques, such as role playing, will enhance the
interview process, especially for young children. Well-designed instructional
software that mimics the virtual world will provide good tests of transfer,
and will also enable automated data collection for assessment.

In the case of assessing the world-building process, it may be beneficial
for students to formulate their own evaluation methods. The process of
stating criteria for the successful completion of a world stimulates reasoning
and problem-solving skills, encourages students to teach and test one another,
demonstrates that students grasp fundamental and critical knowledge, and
reinforces learning. This practice follows the constructivist paradigm through
student-centered learning, embedding assessment into the learning process,
and allowing for open-ended outcomes tailored to individual students.

Tests of complex levels of cognition, such as problem solving, building
mental models and metacognition, will need to be adapted to fit the nature of
VR. Tasks must not only engage the students; they must also address the
unique immersive and interactive aspects of VR so as to distinguish
the level of learning directly attributable to the technology. As a general
principle, research and development of VR should strive to encourage greater
human-human collaboration and interaction, possibly using the level and
quality of this interaction as a measure of success.

Research using VR is susceptible to every validity and reliability 
confound in conventional assessment, plus a whole new set related to the 
technology. Thoughtful application of theory to practice should reveal the
educational potential of VR.